I was trying to implement PageRank in Hadoop. I created a shell script to run MapReduce iteratively, but the while loop just doesn't work. I have two MapReduce jobs: the first computes the initial page rank and prints the adjacency list; the second takes the output of the first reducer as input to its mapper.
The shell script:
#!/bin/sh
CONVERGE=1
ITER=1
rm W.txt W1.txt log*
$HADOOP_HOME/bin/hadoop dfsadmin -safemode leave
hdfs dfs -rm -r /task-*
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.3.3.jar \
    -mapper "'$PWD/mapper.py'" \
    -reducer "'$PWD/reducer.py' '$PWD/W.txt'" \
    -input /assignment2/task2/web-Google.txt \
    -output /task-1-output
echo "HERE $CONVERGE"
while [ "$CONVERGE" -ne 0 ]
do
echo "############################# ITERATION $ITER #############################"
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.3.3.jar \
    -mapper "'$PWD/mapper2.py' '$PWD/W.txt' '$PWD/page_embeddings.json'" \
    -reducer "'$PWD/reducer2.py'" \
    -input task-1-output/part-00000 \
    -output /task-2-output
touch w1
hadoop dfs -cat /task-2-output/part-00000 > "$PWD/w1"
CONVERGE=$(python3 $PWD/check_conv.py $ITER>&1)
ITER=$((ITER+1))
hdfs dfs -rm -r /task-2-output/x
echo $CONVERGE
done
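To make sure the loop construct itself is sound, I reduced it to a Hadoop-free sketch, replacing check_conv.py with a stub that "converges" after three iterations (the stub and iteration count are just for illustration, not my real convergence logic):

```shell
#!/bin/sh
# Minimal repro of the iterate-until-converged pattern, Hadoop steps stubbed out.
CONVERGE=1
ITER=1
while [ "$CONVERGE" -ne 0 ]
do
    echo "ITERATION $ITER"
    # Stub for: CONVERGE=$(python3 "$PWD/check_conv.py" $ITER)
    if [ "$ITER" -ge 3 ]; then CONVERGE=0; else CONVERGE=1; fi
    ITER=$((ITER+1))
done
echo "converged after $((ITER-1)) iterations"
```

Run standalone, this version does loop, which makes me suspect the problem is in what the command substitution captures rather than in the loop syntax itself.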
The first mapper runs perfectly fine and I am getting output from it. But the while loop condition [ "$CONVERGE" -ne 0 ] just evaluates to false, so the loop never runs the second MapReduce job. I tried removing the quotes around $CONVERGE; it still doesn't work.
CONVERGE is defined at the beginning of the file and is updated inside the loop with the output of check_conv.py, yet the while loop just doesn't run.
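One thing I still want to rule out is whether check_conv.py's stdout contains more than the bare number. This toy sketch (the simulated log line is my own guess, not my actual output) shows how a multi-line capture makes the numeric test fail:

```shell
#!/bin/sh
# Simulate a checker that prints a log line before the value.
# $(...) strips only trailing newlines, so the log line survives in $val.
val=$(printf 'log line\n0\n')
if [ "$val" -ne 0 ] 2>/dev/null; then
    echo "non-zero"
else
    echo "numeric test errored or value was 0; captured: '$val'"
fi
```

A capture like this would make [ "$CONVERGE" -ne 0 ] report an "integer expression expected" error (silenced here) and take the false branch, which matches the symptom I see.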
What could I be doing wrong?